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Abstract 

We present a complete specification of a new benchmark for measuring the perfor- 
mance of modem computer systems when solving scientific problems featuring irregular, 
dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. 
The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, 
discretized on an adaptively refined, unstructured mesh. 


1 Introduction 

The NAS Parallel Benchmarks (NPB) [1. 2] were originally formulated to measure the perfor- 
mance of high-performance computer systems, especially parallel machines, when applied to 
computational problems of importance to the scientific community in general, and to NASA 
in particular. Despite the relatively limited scope of the eight problems that made up NPB, 
the benchmarks became quite popular and have been widely used by researchers, computer 
vendors/buyers, and software tool developers. Since their initial release as paper-and-pencil 
specifications in 1991, and as source code implementations (in MPI) in 1995 [2], it has be- 
come increasingly clear that the NPB problems lack in the area of irregular and dynamically 
changing memory accesses, which may defeat the hardware and software support for mem- 
ory traffic in modem computer architectures. The original NPB featured applications with 
non-varying, (mostly) fixed-stride memory access, which can be exploited by compilers and 
specialized hardware to reduce the cost of memory traffic. Not all applications of importance to 
NASA have such simple, predictable memory accesses, however, and it was found in a study by 
Oliker and Biswas [8] that some strongly dynamic applications fare poorly when implemented 
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on certain widely used parallel computers. This situation is exacerbated when the program- 
ming environment selected offers limited control over data placement and granularity of data 
traffic, as is often the case with distributed shared memory systems. 

The new benchmark specified here, to be incorporated in the NPB suite, aims to provide 
a standardized method for gauging the performance of computer systems when running scien- 
tific applications whose memory access patterns are irregular and continually changing. The 
motivation for the specific design choices that led to the problem specification were presented 
in a previous report [3]. 

1.1 Governing equation 

The mathematical model for the heat transfer problem is as follows: 

T t + v • VT = eV 2 T + S(x, t) , (1) 

where T is temperature; v is the flow velocity; e is the heat diffusion coefficient; t is time; and 
S(x. t ) is a source term defined by 

f cos(j||x - x 0 - vf|i) + 1 if j|x - x 0 - vt\\ < a 
S(x,t) = < (2) 

[ 0 if ||x — x 0 — v£|| > a 

Here jj • || signifies the Euclidian norm. The location vector x is defined by (x, y, z), and x 0 
is the initial location of the center of the source. The initial temperature distribution on the 
domain is zero, and the Dirichlet boundary conditions are fixed at zero as well. The prescribed 
velocity field v = {u. v, w) is uniform and constant, and equals the speed of the source. 

Thus, it is known explicitly when and where mesh refinement and coarsening are required. 
Fine mesh cells are prescribed where large temperature gradients exist, i.e. near the traveling 
source. After the source has passed a certain region, high resolutions are no longer needed 
there, and grid coarsening is applied. Even though the elements are rectangular, the circular 
shape of the source results in a nonuniform grid, see Section 4. 

1.2 Temporal discretization 

Temporal discretization is based on an operator splitting technique developed by Maday, Pat- 
era, and Rpnquist [5], The two-step time splitting scheme for time integration of the governing 
equation is: 


rpn± 1 
_ /j'n-r 1 


At 


RK 4 (— v- VT n + S(x,t)) 
eV 2 T n+1 


(3) 

(4) 
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The explicit, fourth-order Runge-Kutta method, RK 4 , in Eq. (3) is defined as follows. 

fn+l = j,n + 1 ^ + 2k . 2 + 2k 3 + k 4 ) 

h = Af(— v • VT n + S'(x.f)) 

h = if(-v-v(r + |) + s(x,(4-y)) (5) 

= At(-v ■ V(T" + |) + S(x, t + ^)) 

^4 = Af (— v ■ V (7 '* -f- k^) + S (x, f + At)) 

The Euler Implicit method used for the time advancement of the diffusion term (4) leads 
to a large system of equations after spatial discretization. It is solved using a Preconditioned 
Conjugate Gradient (PCG) method, described in Section 3.3. 


2 Grid system 

For spatial discretization of the problem we use the Spectral Element Method (SEM) [9], a 
high order technique for solving partial differential equations. It is a hybrid of spectral and 
finite element methods, combining their respective high accuracy and geometrical flexibility. 
In this benchmark we use a non-conforming, adaptively refined grid that changes as the heat 
source travels through the domain. Details about grid adaptation are presented in Section 4. 
Non-conformity of the grid is restricted; a grid cell— or element - is refined by dividing all its 
sides by two, and if two elements have a common face or edge, the ratio of the lengths of their 
sides is at most two. Hence, if two elements share a face, the faces are either equal in size, cl- 
one is four times the size of the other. Simple examples of a conforming and a non-conforming 
grid are shown in Figure 1 . 
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Figure 1 : (a) Conforming grid, (b) Non-conforming grid 
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2.1 Collocation points 

Each element is covered by a Cartesian product of (N + l) 3 collocation points, where N is 
the order of the Legendre-based polynomials used to evaluate the interpolated solution on the 
whole element. For this benchmark N is fixed at four for all elements. There are collocation 
points on all element boundaries. Their locations within each element are those corresponding 
to Gauss-Lobatto-Legendre (GLL) quadrature, see Section 3.2. If the grid has K elements, 
the total number of collocation points is (N + 1) 3 K = 125 K. Where elements meet, some 
of the collocation points belonging to the different elements coincide in space — they are not 
topologically unique. The collocation points are used for the advancement of the convection 
as well as the diffusion term. For each such point an elemental contribution is computed, as 
defined in Section 3, which can be evaluated locally using just the collocation points within a 
single element. 

2.2 Grid points 

The purpose of SEM is to compute a single solution value for each unique point in space 
contained in the grid system (a grid point). Each such value is called a degree of freedom. 
There are always more collocation points than grid points, except for a trivial grid containing 
a single element. For example, a conforming grid of M x M x M elements has (M (N 4- l)) 3 
collocation points and {MN + l) 3 grid points. For a non-conforming grid the number of 
degrees of freedom depends on the particular grid configuration. Each grid point coincides 
with one or more collocation points, but not all collocation points coincide with a grid point. 
Where a coarse and fine element meet along an edge or face, only the collocation points on the 
edge or face of the finer element are included as grid points. Figure 2 shows an example of the 
configuration of grid points and collocation points for a simple grid. 

2.3 Mappings 

In the course of the solution process it is necessary to map values from collocation points to grid 
points, and vice versa. The former is called a gather, and the latter a scatter. These operations 
form the basis of the Mortar Element Method (MEM) [6]. Both gather and scatter are the 
identity operation (copy) for points in element interiors, and for points on grid boundaries. In 
that case a grid point coincides with a single collocation point. Below we define the mappings 
for all other points in the grid, all of which are on interfaces between elements. 

2.3.1 Scatter 

Conforming face interior, edge 

If all grid points in the interior of a face of an element coincide with collocation points 
of that element, the face interior is conforming. If all grid points on an edge of an element 
coincide with collocation points of that element, the edge is conforming. 
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Collocation 

points 

(front plane) 


Grid points 
(front plane) 


Figure 2: Collocation and grid points for simple grid 


The scattering consists of copying values at grid points to the coincident collocation points 
in the conforming loci defined above. 

Non-conforming face 

A face of an element is non-conforming if neither the face interior nor its four edges are 
conforming. 

The scattering of values satisfies a jump condition that minimizes the difference in values 
across the interface in an integral sense [6]. This is accomplished as follows. Let the grid points 
that lie on the face of the coarse element be indicated by (m, k).m = 1, . . . , 2 N + 1. k = 
1, . . . , 2 N 4- 1, and let the quantity to be scattered be called <?. The scattered values <f \j at 
collocation points on the coarse-element side of the interface are: 

2N’+1 2A r +l 

r I ) i J — QimQjkOmk , i, j = 1, ■ ■ . , N + 1 . (6) 

m= 1 k= 1 
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This scatter operation to the collocation points of a single element face can be viewed as a col- 
lection of one-dimensional scatterings in two respective coordinate directions. For the deriva- 
tion of the one-dimensional scatter matrix Q we refer to [6]. Its elements (transposed) are 
listed in Table 1. It follows from the shape of Eq. 6 that the order in which the one-dimensional 
scatterings are earned out does not matter. 

Non-conforming edge 

An edge of an element face is non-conforming if it is adjacent to a conforming face interior, 
but is itself not conforming. In the scattering to a non-conforming edge, either ikm or j&k in 
Eq. 6 are fixed. The fixed value is 1 or A r + 1 for either i or j, and 1 or 2N + 1 for the other 
fixed index, depending on the location of the edge. 


Qij 

i—1 

i = 2 

i = 3 

i — 4 

i = 5 

7 ~ 1 

1.0. 

-0.1772843218615690 

9.375E-2 

Q29 

0.0 

3 = 2 

0.0 

0.7152146412463197 

-0.2285757930375471 

Q29, 

0.0 

7-3 

0.0 

0.4398680650316104 

0.2083333333333333 

Q27 

0.0 

7 = 4 

0.0 

8.333333333333333E-2 

0.3561799597042137 

Q26 

0.0 

7=5 

0.0 

0.0 

0.140625 

Q25 

0.0 

7 = 6 

0.0 

-4.854797457965334E-2 

Qu 

Q 24 

0.0 

7 = 7 

0.0 

-5.891568407922938E-2 

Q 33 

Q -23 

0.0 

7 = 8 

0.0 

8.333333333333333E-2 

Qsi 

Q22 

0.0 

7 = 9 

0.0 

-3.7001 39242414530E-2 

Qzi 

Q2I 

1.0 


Table 1 : One-dimensional scatter matrix Qij for N = 4 


The global scattering operation from the grid points to all collocation points can be written 
as: R c 4— #R S , where 9 is the global scatter matrix. 

2.3.2 Gather 

The global gathering of residuals from the collocation points to all grid points simply involves 
the transpose of the global scatter matrix: R ? -(— # r R c [7], Use of the ( 9 T , 0)-pair is critical 
for the solution of the linear system of equations for advancing the diffusion term, since it 
guarantees symmetry and positive definiteness of the overall discretization matrix. 

The gathered value of the residual at any grid point consists of the sum of all contributions 
from neighboring elements, i.e. the sum of the results of all relevant local gathering operations, 
to be defined below. This is termed direct stiffness summation. 

Conforming face interior, edges 

On conforming face interiors and edges, the value at a grid point is the sum of the residuals 
at all coincident collocation points. 
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Non-conforming face 

On a non-conforming face, the contribution 5r mk of the elemental residuals $ at collocation 
points to the residual at a grid point (to, k) that lies on the same face is: 

N+l N + 1 AO-1 N+l 

mk ~ E E = E E QimQjk^ij, TO, k = 1, . . . , 2N + 1 . (7) 

i — 1 j = 1 i — 1 j=l 

Non-conforming edge 

For the computation of elemental contributions to the residuals at grid points on non- 
conforming edges we again use Eq. 7, but either i&m or jSzk are fixed. The fixed value is 

1 or A r + 1 for either i or j, and 1 or 2 N -1- 1 for the other index, depending on the location of 

the edge. 


3 Spatial discretization 


Within each element the dependent variable, temperature T, is approximated by: 


N+ 1 N+l N+l 

T(r,s,t) ~ 5] EE 

T i j k h i (r)h j (s)h k (t) 


(B) 

i—1 j= 1 k—l 




hp(£) = | 1 

if 

(1 -f 2 )X' v (f) :f 


(9) 


— . I'J it/ til W T U1LUV V/l Uiv Ll^l III IL.I mill Ij 


temperature at collocation point (i, j, k) within the element; is the 


p th GLL collocation point; r, s, and t are local coordinates that sweep out the cubical element; 


mapping from global to local coordinates is as follows; 


X — Xi 
r = 

x 2 ~ Xi 


x 2 - 1, 


y-y i 
1/2 — 2/1 


x 2 - 1, 


t = - — — x 2 — 1 (10) 

2 2 - Zi 


where (xj, yi,Zi) and (x 2 , y 2 , z 2 ) are the coordinates of the diagonally opposite corner points 
of the element with the smallest and largest coordinate values, respectively. The interpolating 
polynomial h, which is a function of the N th order Legendre polynomial, L N , and its derivative, 
L' n , has the property that h p (£ q ) = 8 pq for p, q e {1, ..., N + 1}, where 8 pq is the Kronecker 
delta. We also note that L' N (£ q ) = 0 for q G {2, .... N}. 


3.1 Advancing the convection term 

At the start of each time step the values of T at the grid points are scattered to the collocation 
points. Eq. 5 is used to advance the convection term using only information local to the ele- 
ment. We use the spatial discretization of T, defined in Eq. 8, to compute the element-wise 
approximation to its directional derivative at collocation point (i,j, k): 


1 



( 11 ) 


(V • VT) ljk 


JV-fl 

E 

9=1 


2 2 2 
'Vx~f~DiqTqjh + Vy-jr-D jqTiq k + V z DkqTijq 
Li . Lo -L 3 


where L t is the size of the element in the i th coordinate direction, and D l3 is defined as ^(4)- 
The values of D are listed in Table 2. 


Dij 

J = 1 

7=2 

7 = 3 

7=4 

7 = 5 

i — 1 

- 5.0 

6.756502488724238 

- 2.666666666666667 

-D52 

-D51 

i = 2 

- 1.240990253030982 

0.0 

1.745743121887939 

-D42 

-D41 

i = 3 

0.375 

- 1.336584577695453 

0.0 

-D32 

-D31 

2 = 4 

- 0.2590097469690172 

0.7637626158259734 

-D23 

-D'22 

-D21 

i — 5 

0.5 

- 1.410164177942427 

-D13 

-D \ 2 

-D n 


Table 2: Derivatives D t j of one-dimensional elemental basis functions hj for N = 4 


3.2 Advancing the diffusion term 

The variational form of Eq. (4) is: 

r pn - HI r pn + 1 

(- s ,V) = (-eVT" +1 , V«), (12) 

where (•,•) represents the L 2 inner product, and v is a test function. 

We select for the basis of the discrete space of test functions and solutions a set of func- 
tions 14 = {u(p)/i}, each corresponding to exactly one grid point x p . The basis function cor- 
responding to x p is obtained by the residual gathering operation 8 T , applied to the elemental 
basis functions u(r,s,t)i jk — hi(r)hj(s)h k (t ) of all elements that contain x p . This operation 
guarantees continuity of the solution at element vertices and at conforming interfaces, as well 
as a minimum jump — in an integral sense — across non-conforming element interfaces [6]. 

Hence, we can compute elemental residual contributions &ij k for each collocation point in 
each element and compute the total residual at a grid point by applying the gather operation 6 T 
(see Section 2.3.2). We define: 

T _I n+l rpn+l 

* ijk = ( ^ u ljk ) - (— eVT n+1 , Vu ijk ) , (13) 

where Uij k is the elemental basis function for collocation point (i, j, k). 

GLL quadrature is used to calculate the integrals in Eq. 13. One-dimensional GEL quadra- 
ture of a function r is defined as follows: 

r l A r +1 

i = 1 

8 


£=-l 


( 14 ) 




GLL quadrature locations <f t and the corresponding weights p t are given in Table 3. Multi- 
dimensional versions of GLL quadrature are derived by applying Eq. 14 successively in the 
respective coordinate directions. 


i 

1 



2 

3 

4 

5 


-1.0 

-0.6546536707079771 

0.0 

0.6546536707079771 

1.0 

Pi 

1/10 

49/90 

32/45 

49/90 

1/10 


Table 3: GLL collocation points L with associated weights pi for N — 4. 


Substituting Eq. 8 into Eq. 13 and applying GLL quadrature yields: 


^ ijk — 


'N+lN+lN+l / ^ JV+1 


[_ 1 = 1 771 = 1 0=1 


EEEw( ^-( ) PmPo ^ ^ PqDqiDglSjijj S^o-p 


N + 1 


) PlPo y ^ Pq D qj D qm&il&ko~ > r 


9=1 

jV+1 


) PlPm y ^ PqDqkDqo&ilfojm T" 


9=1 

7V+1 jV+1 N+l 


PiPj Pk&il fijm&ko 
At 


rpn + \ 

* Imo 


f ^ JV-t-J. JV + 1. jv + i 

I X-v X^ *22 1^1 PiPjPk^iAjm^ko > 

^ (=1 m=l o=l ' 


(15) 


where \ J\ is proportional to the volume of the element, i.e. J J\ = kdaLx_ Upon gathering 
the residual contributions at the grid points and setting the total residual equal to zero, we 
symbolically obtain the following system of equations: 


6 T {A6T* +l - b) = 0, (16) 

where b = BT£ +1 . T c n+1 and T™ +1 signify the temperature at the collocation points after 
advancement of the convection term, and at the grid points after advancement of the diffusion 
term, respectively. The matrix A represents the assembled elemental discretization matrix of 
the term in square brackets in Eq. 15 , which is symmetric, positive definite. Obviously, 6 T A6 
is symmetric as well. Since the scatter matrix 8 is of full rank, 0 T A9 is also positive definite. B 
represents the assembled elemental discretization matrix of the term in curly braces in Eq. 15. 


3.3 Conjugate Gradient method 

We use the Preconditioned Conjugate Gradient (PCG) method to solve Eq. 16. 

The initial guess T 0 is based on T”" 1 ' 1 : To = o r T c n41 . The a T operator is applied to the 
solution on collocation points, T < T rl , to obtain T 0 on grid points. It is defined as follows: on 
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element interfaces the solution at a grid point is computed by averaging values (arithmetic 
mean) at the coincident collocation points of all neighboring_/zn<? elements. If a grid point is on 
a conforming interface, all neighboring elements are considered fine elements. At points not 
on element interfaces the grid point value is set to the value at the coincident collocation point. 

It should be noted that although both a T and 0 r constitute gather operations, the latter 
cannot be used to compute actual solution values. It is specifically derived to gather residuals, 
such that the solution of the resulting system of equations satisfies a minimum jump condition 
when the aggregate residual vanishes. This gather operation does not coiTespond to any kind 
of projection or averaging, and hence cannot be used to gather solution values. 

initialization: 

T 0 = a r T" +1 ; 

r 0 = 9 t BT? +1 - 9 T AOT 0 = 9 T (BT? +1 - A9T 0 ); 

<?o = p-Vo'; 

Po = Qo', 

iteration: 

— (?mbm)/(Pm)^ -A9p m ) , 

Tra+1 T rn + 0-mPm> 

An+1 An A.6p m , 

Qm + 1 = T T m +\, 

brn (*7m+ 1) t’m+l ) / {Qrni ^m)» 

Pm + 1 ~ 1 m + 1 T 

where m refers to iteration number, r m is the residual, p m the search direction, P is the precon- 
ditioner, q m is a vector associated with the preconditioning and (-,-) is the usual discrete inner 
product. Note that the dependent variable T m does not have to be updated within the PCG 
iterative loop. It is sufficient to accumulate the quantities a m p m at each grid point and scatter 
their sum to the collocation points after the PCG iterations have been completed, where it gets 
added to T c n+1 . The stop criterion for the iterations is given in Section 5. 

A diagonal preconditioner P is used. See the Appendix for how to compute it. 

P = Diag{9 T A9) . (17) 


4 Mesh adaptation 

The mesh adaptation is subject to two restrictions: 

1. the maximum number of levels of refinement is constant for each problem size (Class). 

2. the difference in size between any two elements that have a face or edge in common is at 
most a factor of eight (one level of refinement). 
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Figures 3 (a) and (b) show a valid and an invalid grid, respectively. In Fig. 3 (b), the 
refinement of element 14 results in a situation where elements 17, 20 and 21 have two levels of 
refinement difference with some of their neighboring elements. 


20 

21 

13 

14 

17 

i 

9 

i 

10 


20 

21 

13 

1 

4 ^ 

17 

9 

10 


(a) (b) 

Figure 3: (a) 2D front view of a valid 3D mesh, (b) Invalid mesh that violates criterion 2 


4.1 Refinement and coarsening 

Both refinement and coarsening are isotropic. For refinement, an element is split evenly in the 
x, y and z directions into eight child elements. For coarsening, the eight children are merged 
into one element, the parent. The coarsening is performed only when all eight child elements 
are identified to be coarsened. Both refinement and coarsening must obey the two adaptation 
criteria. If an element is marked for refinement (see Section 4.3), but can not be refined due 
to restriction 2, its neighboring elements which prohibit the refinement will be marked to be 
refined first. Any element not marked for refinement will be coarsened, if possible. 

If a refinement will result in a difference of more than one level of refinement between 
neighboring elements, additional refinements will be applied as needed until all required re- 
finements are obtained. For example, in the situation of Fig. 3 (b), elements 17, 20 and 21 will 
be refined first to allow the refinement of element 14. 

In Fig. 4, the coarsening of elements 61, 62, 64 and 65 may be desired. However, the 
coarsening will result in element 17 having two levels of refinement difference with element 
61, with which it shares an edge. Thus, the coarsening is cancelled. If all elements in Fig. 4 
are to be coarsened, this is accomplished in two steps. The merge of elements 12, 13, 16, and 
17 is performed first, then the other merge is performed. Therefore, no invalid grid will appear 
at any time. 
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Figure 4: 2-D front view of a 3-D mesh; coarsening of elements 61, 62, 64 and 65 alone is not 
allowed. 


4.2 Grid mapping 

After a new grid has been generated, the old solution must be transferred to it. This is ac- 
complished by mapping collocation point values from coarse to fine elements through multi- 
dimensional interpolation. It consists of sampling the solution T* at the collocation points of 
the finer elements using Eq. 8. For evaluation of function values at just one level of refine- 
ment difference we tabulate h^j), where hi is the t th interpolation polynomial pertaining to 
the coarser element ( i = 1, . . . , N ), and is the j th collocation point of the union of finer 
elements contained in the coarse element. See Table 4. That is, collocation points 1 through 
2 N + 1 are contained in the two adjacent fine elements that span the coarse element in a cer- 
tain coordinate direction, i.e. 1 through N + 1 in the first, and N + 1 through 2 N + 1 in the 
second. If interpolation across more than one level of refinement needs to be performed, the 
interpolation can be defined recursively, using Table 4 repeatedly, or directly, by sampling T at 
the appropriate collocation point locations. 


h’ij — 

i — 1 

i = 2 

i = 3 

i = 4 

i = 5 

3 = 1 

1.0 

0.0 

0.0 

^29 


3 = 2 

0.3385078435248143 

0.7898516348912331 

- 0.1884018684471238 

GO 

(N 

-C2 

his 

3 = 3 

- 0.1171875 

0.8840317166357952 

0.3125 

hi7 

hn 

j = 4 

- 7 . 065070066767 144 E -2 

0.2829703269782467 

0.9026875827328380 . 

h2e 

his 

3 = 5 

0.0 

0.0 

1.0 

h-25 

hi5 

j = 6 

4 . 98444258478 1999 E -2 

- 0.1648516348912333 

h$4 

h 24 

hu 

3=7 

0.0390625 

- 0.1184067166357950 

hzz 

^23 

hi 3 

3=8 

- 3 . 1987282990677 15 E -2 

9 . 202967302 175333 E -2 


hn 

hi2 

3 = 9 

0.0 

0.0 

h-a 

hn 

hn 


Table 4: Interpolation coefficients from coarse to fine elements (one level of refinement) 
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The mapping from fine to coarse elements is always defined recursively. The interpola- 
tion formula Eq. 8 for the fine element is used to evaluate T at the collocation points of the 
element at the next coarser level that fall within the fine element. For those coarse-element 
collocation points that are on the boundary between fine elements it does not matter which fine 
element contributes the interpolated values. The solution process guarantees that coincident 
fine-element collocation point solution values are identical at the end of each time step. We use 
a variation of Eq. 8 to compute the value of T c at the coarse-element collocation points from 
the fine-element values T * : 

2N+12N+12N+1 

T iok= E E E , i,j,k = i, . . . , iv , (18) 

o=l p= 1 q= 1 

where hi is one of 2 N + 1 interpolation polynomials of two adjacent fine elements w'ithin the 
coarse element, is the j th collocation point of the coarse element, and T( ik is the value of the 
temperature at collocation point (i,j, k ) of the union of all eight adjacent fine elements making 
up the coarse element. Since coincident collocation points at fine-element boundaries share the 
same value of T^, we count them as the same point in Eq. 18. We list the values of h/^j) in 
Table 5 below. 


hi(Ci) 

3 = 1 

3 = 2 

3= 3 

j = 4 

7=5 

i = 1 

1.0 

-0.1179652785083428 

0.0 

0.0 

0.0 

i = 2 

0.0 

0.5505046330389332 

0.0 

0.0 

0.0 

i — 3 

0.0 

0.7024534364259963 

0.0 

0.0 

0.0 

i — 4 

0.0 

-0.1972224518285866 

0.0 

0.0 

0.0 

i = 5 

0.0 

6.222966087 199998E-2 

1.0 

6.222966087 199998E-2 

0.0 

i = 6 

0.0 

0.0 

0.0 

-0.1972224518285866 

0.0 

i = 7 

0.0 

0.0 

0.0 

0.7024534364259963 

0.0 

i = 8 

0.0 

0.0 

0.0 

0.5505046330389332 

0.0 

i = 9 

0.0 

0.0 

0.0 

-0.1179652785083428 

1.0 


Table 5: Interpolation coefficients from fine to coarse elements 


4.3 Adaptation criteria, frequency 

Any element that has a nonzero overlap with the heat source gets refined. The refinement 
procedure is performed recursively until all elements that have nonzero overlap with the heat 
source have the minimal element size of that class. Overlap is considered nonzero if the point 
of the element closest to the center of the heat source is at a distance of less than a, where a is 
the radius of the heat source. Other elements are refined up to the minimum level required to 
satisfy mesh restriction 2 (page 10). Due to the movement of the heat source, some elements 
may no longer have overlap with the heat source. They will be coarsened recursively until ail 
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possible coarsenings are done, obeying restriction 2. The first adaptation takes place before 
computations start, i.e., before the first time step. Subsequently, after a certain prescribed 
number of time steps (differs for the different benchmark classes, see Table 6) the grid gets 
re-adapted using the same criteria as above. 


5 Verification 

The initial grid has one element, [0, l] 3 , which makes up the whole domain. The center of the 
heat source is located at x 0 = (f , f ) at the beginning of the computation, i.e. t = 0. It moves 
at a speed of v = (3, 3, 3). The size of the source is defined by a. We set the heat diffusion 
coefficient e to 0.005 for all cases. The correctness of the result is verified by the integral of 
the solution / n Tdfl at the end of the time stepping. It is computed as follows: 

„ Af+l N+ 1 N+ 1 

I Tdn = E EEE \J\pkPjPiTijk • (19) 

^ all elements k= 1 j= 1 2=1 

We list the parameters and verification values for all benchmark classes in Table 6. nt is the 
number of time steps, nl is the maximum number of levels of refinement, nt a is the number of 
time steps between adaptations, nPCG is the number of iterations for the PCG method. The 
size of the time step At is related to the maximum number of levels of refinement as follows: 
At ~ 0.04 * 2~ nl . 

For convenience we also list the number of elements, ne, at the end of the computation. 
A solution is considered incorrect if the absolute value of the difference between computed 
integral and verification value, divided by the verification value, exceeds the error threshold of 
10 " 8 . 


Class 

nt 

nt a 

nPCG 

nl 

a 

L TdQ 

ne 

S 

50 

5 

10 

4 

0.04 

1.8900131 10962E-3 

246 

W 

100 

5 

10 

5 

0.06 

2.569794837076E-5 

526 

A 

200 

5 

10 

6 

0.076 

8.93999628 1443E-5 

2038 

B 

200 

5 

10 

7 

0.076 

4.507561 922901E-5 

7841 

C 

200 

5 

10 

8 

0.067 

1.544736587100E-5 

31641 

D 

250 

5 

10 

10 

0.046 

1.577586272355E-6 

506297 


Table 6: UA verification values 
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6 Summary 

We now give an overview of the steps in the completion of the benchmark. 

1. refine the initial grid based on the refinement criteria. 

2. set the solution at all collocation points to zero. 

3. compute the source at all collocation points, using Eq. 2. 

4. advance the solution (convection term) at all collocation points, using Eqs. 5 and 11. 

5. set T 0 for PCG, using the operation a T , described in Section 3.3. 

6. advance the solution (diffusion term) at all grid points, using PCG described in Section 
3.3. 

(a) compute the element-wise residual contribution defined in Eq. 1 3. 

(b) collect the residual at the grid points, using the gather operation 6 T . 

(c) scale the residual, using diagonal preconditioning. 

(d) scatter the scaled residual to the collocation points. 

(e) apply the element-wise diffusion operator (Eq. 13) to the scaled residual. 

(f) gather the resulting residual increments at the grid points. 

(g) accumulate the temperature updates at grid points, 
thl comnufe a new residual vector 

(i) scale the residual. 

(j) compute a new search direction. 

(k) return to step 6e as long as the stopping criterion has not been satisfied. 

(l) scatter the temperature update to collocation points. 

7. if the step qualifies, refine/coarsen the grid based on the new location of the source and 
map the solution at the collocation points to the new set of collocation points. 

8. return to step 3 as long as the total number of time steps has not been exceeded. 
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Appendix: Simple method for computing the preconditioner 

The system to be solved in the diffusion step can be written as 0 T A6x = 0 T b, where A is the 
assembled discretization matrix. Therefore, the diagonal preconditioner P satisfies the relation 
P i J 1 = ( 9 T A6)ijSij . However, 6 T A6 is never computed explicitly, because it is inefficient to 
do so and because it would be too large to store. A straightforward way to obtain is to 
evaluate the matrix- vector products u t = (6 T A6)e t for as many unit vectors e t as there are grid 
points. The i th entry of u; is the i th diagonal element of (9 T AO). This procedure, which we 
call General, comprises three steps: 

1. Calculate Oei, the resulting vector is the i th column of 9. 

2. Calculate A9ei. Each row of A relates to one elemental collocation point. The sum 
over all elements A P j(9e t )j is th ep th entry of AAe*. A is a block diagonal matrix, 
each block corresponding to one element, ntot refers to the total number of collocation 
points. In conforming cases, each element of 9 is either zero or one, and, hence, so is 
each element of vector 0e;. The product A0e t is a vector, whose p th entry is nonzero iff 
( 9ei) p = 1, and has the value A pp in that case. 

3. Calculate (6 r A9ei). Since we only need the i th entry of the resulting vector (0 T A0e l ), 

we only need to perform the inner product of the i th row of 0 T with A9e t . Note that the 
i th row of 0 T is just the i th column of 6. In conforming cases, the resulting ( 6 T A0ei ) is 
vector Ui, whose i th entry is A pp , G t = {g\(0ei) g = 1}. 

In conforming cases, this procedure can be Simplified. 

1 FvnlnatP 4 Inrnllv \x/ithin 

— ‘-nun J •— *** - — 

2. Construct a vector a whose m th entry equals -4 mm . 

3. Perform a global stiffness summation on a: a = 6 T a. 

4. Preconditioner P is obtained from a: P~ l = a t . 

To solve the overall problem we use a hybrid approach. The SIMPLIFIED procedure is used 
to compute the preconditioner for grid points in the interior of elements and on conforming 
boundaries. The GENERAL procedure is used on nonconforming boundaries. 
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